DL 11 - Best Practices (Python)

Best Practices

Scaling Deep Learning Best Practices

  • Use a GPU
  • Early Stopping
  • Larger batch size with a proportionally larger learning rate
  • Use Petastorm to stream large Parquet-backed datasets into training
  • Use Multiple GPUs with Horovod (a Keras sketch covering several of these points follows this list)
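A minimal sketch tying several of these points together, assuming TensorFlow/Keras with a trivial MNIST model: early stopping, a learning rate scaled with the number of workers (and hence the effective batch size), and Horovod for multiple GPUs. The model, batch size, and base learning rate are illustrative placeholders, not prescriptions.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to a single GPU.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

# Illustrative toy model; any Keras model works the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Larger effective batch across workers -> scale the base LR by hvd.size().
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Start all workers from rank 0's initial weights.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    # Early stopping: halt when validation loss stops improving.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True),
]

# batch_size here is per worker; the effective batch is 64 * hvd.size().
model.fit(x_train, y_train, batch_size=64,
          validation_data=(x_val, y_val), epochs=50, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```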

ULMFiT - Language Model Fine-tuning

  • Discriminative Fine-Tuning: use a different learning rate for each layer, with earlier layers getting smaller rates (see the PyTorch sketch after this list)
  • Slanted Triangular Learning Rates: linearly increase the learning rate over a short initial fraction of training, then linearly decrease it for the remainder (sketched below)
  • Gradual Unfreezing: unfreeze the last layer and train for one epoch, then unfreeze the next layer, repeating until all layers are being trained
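A direct sketch of the slanted triangular schedule; the defaults (cut_frac=0.1, ratio=32, lr_max=0.01) follow the ULMFiT paper.

```python
import math

def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular LR (Howard & Ruder, 2018).

    t: current training step, T: total steps (assumes T * cut_frac >= 1).
    Rises linearly for the first cut_frac of steps, then decays
    linearly; lr_max / ratio is the smallest LR ever used.
    """
    cut = math.floor(T * cut_frac)
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio
```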
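And a PyTorch-flavored sketch of the other two ideas; the three-layer model is a hypothetical stand-in for a pretrained network, and the factor of 2.6 between adjacent layers' rates is the one suggested in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical three-stage model standing in for a pretrained LM.
model = nn.Sequential(
    nn.Embedding(10_000, 128),   # "layer 0": embeddings
    nn.Linear(128, 128),         # "layer 1": encoder
    nn.Linear(128, 2),           # "layer 2": task head
)

# Discriminative fine-tuning: each layer gets its own learning rate,
# decreasing by a factor of 2.6 per layer going toward the input.
base_lr = 1e-3
optimizer = torch.optim.Adam([
    {"params": layer.parameters(),
     "lr": base_lr / (2.6 ** (len(model) - 1 - i))}
    for i, layer in enumerate(model)
])

# Gradual unfreezing: freeze everything, then unfreeze one layer per
# epoch starting from the last (task-specific) layer.
for p in model.parameters():
    p.requires_grad = False

for epoch, layer in enumerate(reversed(list(model))):
    for p in layer.parameters():
        p.requires_grad = True
    # ... train for one epoch here before unfreezing the next layer ...
```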

Bag of Tricks for CNNs

  • Use Xavier initialization
  • Learning rate warmup (start with a low LR and increase it gradually to the target LR)
  • Scale the learning rate up linearly with larger batch sizes
  • No weight decay (L2 regularization) on bias terms
  • Knowledge Distillation: train a smaller student model with help from a larger, more accurate teacher by adding a term to the loss that penalizes differences between the two models' softened softmax outputs (a loss sketch follows this list)
  • Label Smoothing: adjust the training targets so the desired softmax output is 1 − ε for the correct class and ε/(K − 1) for each incorrect class, where K is the number of classes (written out below)
  • Image Augmentation (a torchvision pipeline follows this list):
    • Random crops of rectangular areas in image
    • Random flips
    • Adjust hue, saturation, brightness
    • Add PCA noise with a coefficient sampled from a normal distribution
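A sketch of a distillation loss along these lines, using the common softened-softmax (Hinton-style) formulation; the temperature T and mixing weight alpha are illustrative defaults to tune.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Mix the usual cross-entropy with a KL term that pulls the
    student's softened softmax toward the teacher's."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    return alpha * hard + (1 - alpha) * soft
```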
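The label-smoothing targets above, written out directly in NumPy:

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Return targets with 1 - eps on the true class and
    eps / (K - 1) spread over the other K - 1 classes."""
    targets = np.full((len(labels), num_classes), eps / (num_classes - 1))
    targets[np.arange(len(labels)), labels] = 1.0 - eps
    return targets

# e.g. smooth_labels([2], num_classes=4) ->
# [[0.0333, 0.0333, 0.9, 0.0333]]
```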
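And the augmentation list maps onto a torchvision pipeline roughly as follows; PCA ("lighting") noise has no built-in transform in torchvision, so it is omitted here.

```python
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),        # random rectangular crop
    transforms.RandomHorizontalFlip(),        # random flip
    transforms.ColorJitter(brightness=0.4,    # adjust brightness,
                           saturation=0.4,    # saturation,
                           hue=0.1),          # and hue
    transforms.ToTensor(),
])
```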

fast.ai best practices

  • Do as much of your work as you can on a small sample of the data
  • Batch normalization works best when done after ReLU (a minimal sketch follows)
  • Data Augmentation: use the kind of augmentation that fits the data (e.g. don't flip a cat upside down, but flipping a satellite image vertically is fine)
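For the batch-norm placement point, a minimal Keras sketch of the activation-then-normalize ordering; the conventional ordering would swap the last two layers.

```python
import tensorflow as tf
from tensorflow.keras import layers

block = tf.keras.Sequential([
    layers.Conv2D(64, 3, padding="same", input_shape=(32, 32, 3)),
    layers.ReLU(),                # activation first ...
    layers.BatchNormalization(),  # ... then normalize its output
])
```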